Adversarial Inverse Optimal Control for General Imitation Learning Losses and Embodiment Transfer
Abstract
We develop a general framework for inverse optimal control that distinguishes between rationalizing demonstrated behavior and imitating inductively inferred behavior. This enables learning for more general imitative evaluation measures and differences between the capabilities of the demonstrator and those of the learner (i.e., differences in embodiment). Our formulation takes the form of a zero-sum game between a learner attempting to minimize an imitative loss measure, and an adversary attempting to maximize the loss by approximating the demonstrated examples in limited ways. We establish the consistency and generalization guarantees of this approach and illustrate its benefits on real and synthetic imitation learning tasks.
Related resources
Generative Adversarial Imitation Learning
Consider learning a policy from example expert behavior, without interaction with the expert or access to reinforcement signal. One approach is to recover the expert’s cost function with inverse reinforcement learning, then extract a policy from that cost function with reinforcement learning. This approach is indirect and can be slow. We propose a new general framework for directly extracting a...
A Connection between Generative Adversarial Networks, Inverse Reinforcement Learning, and Energy-Based Models
Generative adversarial networks (GANs) are a recently proposed class of generative models in which a generator is trained to optimize a cost function that is being simultaneously learned by a discriminator. While the idea of learning cost functions is relatively new to the field of generative modeling, learning costs has long been studied in control and reinforcement learning (RL) domains, typi...
Inverse Optimal Heuristic Control for Imitation Learning
One common approach to imitation learning is behavioral cloning (BC), which employs straightforward supervised learning (i.e., classification) to directly map observations to controls. A second approach is inverse optimal control (IOC), which formalizes the problem of learning sequential decision-making behavior over long horizons as a problem of recovering a utility function that explains obse...
Model-based Adversarial Imitation Learning
Generative adversarial learning is a popular new approach to training generative models which has been proven successful for other related problems as well. The general idea is to maintain an oracle D that discriminates between the expert’s data distribution and that of the generative model G. The generative model is trained to capture the expert’s distribution by maximizing the probability of ...
Learning Robust Rewards with Adversarial Inverse Reinforcement Learning
Reinforcement learning provides a powerful and general framework for decision making and control, but its application in practice is often hindered by the need for extensive feature and reward engineering. Deep reinforcement learning methods can remove the need for explicit engineering of policy or value features, but still require a manually specified reward function. Inverse reinforcement lea...